Estimating the Statistical Significance of Classifiers used in the Prediction of Tuberculosis
Author
Abstract
Tuberculosis (TB) is a disease caused by the bacterium Mycobacterium tuberculosis. It usually spreads through the air and attacks people with weakened immune systems. Human Immunodeficiency Virus (HIV) patients are more likely to contract TB. It is an important health problem in India as well. Diagnosis of pulmonary tuberculosis has always been a problem. Classification in medicine is an important task in the prediction of any disease, and it helps doctors in their diagnostic decisions. However, the choice of the best classifier cannot depend on accuracies or error rates alone. A critical analysis of these classifiers based on statistical tests is needed. In this paper, a study on the classification of tuberculosis with statistical significance is carried out in two stages. The first stage compares accuracies by classifying TB data into two categories, Pulmonary Tuberculosis (PTB) and retroviral PTB (RPTB), i.e. TB along with AIDS, using basic learning classifiers such as C4.5 Decision Tree, Support Vector Machines (SVM), K-nearest neighbor, Bagging and Naïve Bayesian algorithms. The second stage evaluates the performance of these classifiers using the paired t-test to select the optimum model. Results for our datasets show that the difference between SVM and the C4.5 Decision Tree is not statistically significant, whereas the differences between SVM and Naïve Bayes and between SVM and K-nearest neighbor are statistically significant.
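The second stage described in the abstract, a paired t-test over classifier performance, can be sketched roughly as follows. This is only a minimal illustration, not the authors' procedure: it assumes both classifiers were scored on the same cross-validation folds, the function names and the small table of critical values are introduced here for illustration, and the accuracy figures in the example are made up and are not results from the paper.

```python
import math
from statistics import mean, stdev

# Two-tailed critical values of Student's t at alpha = 0.05,
# keyed by degrees of freedom (n - 1). Only a few entries are listed.
T_CRIT_05 = {4: 2.776, 9: 2.262, 19: 2.093}


def paired_t_statistic(scores_a, scores_b):
    """Return the paired t statistic for two equally long score lists.

    Each position must hold the scores of both classifiers on the
    same fold, so that the differences are meaningfully paired.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def significantly_different(scores_a, scores_b):
    """True if |t| exceeds the tabulated critical value at alpha = 0.05."""
    df = len(scores_a) - 1
    if df not in T_CRIT_05:
        raise ValueError(f"no critical value tabulated for df={df}")
    return abs(paired_t_statistic(scores_a, scores_b)) > T_CRIT_05[df]


if __name__ == "__main__":
    # Hypothetical per-fold accuracies (10 folds), for illustration only.
    svm = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.91, 0.90, 0.94, 0.89]
    knn = [0.85, 0.84, 0.88, 0.83, 0.86, 0.82, 0.87, 0.84, 0.88, 0.83]
    print("t =", round(paired_t_statistic(svm, knn), 3))
    print("significant at 0.05:", significantly_different(svm, knn))
```

Where SciPy is available, `scipy.stats.ttest_rel` performs the same paired test and also returns a p-value, which avoids the hard-coded table of critical values.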
Similar resources
Application of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Detection of premature ventricular contraction arrhythmia in the cardiac electrical signal using a combination of classifiers
Cardiovascular diseases are among the most dangerous diseases and one of the biggest causes of fatality all over the world. One of the most common cardiac arrhythmias considered by physicians is premature ventricular contraction (PVC) arrhythmia. Because it is common at all ages, detecting this type of arrhythmia is particularly important. ECG signal recording is a non-invasive, popula...
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
Evaluation of Classifiers in Software Fault-Proneness Prediction
The reliability of software depends on its fault-prone modules: the fewer fault-prone units the software contains, the more we may trust it. Therefore, if we are able to predict the number of fault-prone modules of software, it will be possible to judge the software reliability. In predicting software fault-prone modules, one of the contributing features is software metric by which one ...
Prediction of scour dimension in the Plunge Pools below Outlet Bucket with Artificial intelligence method
Accurate prediction of sediment scour hole dimensions downstream of hydraulic structures, e.g. the outlet bucket, is a complex and not straightforward engineering problem encountered worldwide. Because of the complexity of the problem, a comprehensive study that simultaneously accounts for water flow, sediment and all of the effective variables involved in scouring is not easily possible. Dimens...